We start by importing the relevant packages, including netzoor
library(netZooR)
Loading required package: igraph
Warning: package ‘igraph’ was built under R version 4.3.3
Attaching package: ‘igraph’
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
Loading required package: reticulate
Warning: package ‘reticulate’ was built under R version 4.3.3Loading required package: pandaR
Loading required package: Biobase
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:igraph’:
normalize, path, union
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, aperm, append, as.data.frame, basename,
cbind, colnames, dirname, do.call, duplicated, eval,
evalq, Filter, Find, get, grep, grepl, intersect,
is.unsorted, lapply, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages
'citation("pkgname")'.
Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)
library(tidyverse)
Warning: package ‘purrr’ was built under R version 4.3.3Warning: package ‘lubridate’ was built under R version 4.3.3── Attaching core tidyverse packages ─────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4 ── Conflicts ───────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::%--%() masks igraph::%--%()
✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
✖ dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
✖ purrr::compose() masks igraph::compose()
✖ tidyr::crossing() masks igraph::crossing()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
✖ purrr::simplify() masks igraph::simplify()
ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
# Change to your organism
library(clusterProfiler)
Warning: package ‘clusterProfiler’ was built under R version 4.3.3clusterProfiler v4.10.1 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:purrr’:
simplify
The following object is masked from ‘package:igraph’:
simplify
The following object is masked from ‘package:stats’:
filter
library(purrr)
library(org.Hs.eg.db) # Human gene annotation database
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:clusterProfiler’:
rename
The following objects are masked from ‘package:lubridate’:
second, second<-
The following objects are masked from ‘package:dplyr’:
first, rename
The following object is masked from ‘package:tidyr’:
expand
The following object is masked from ‘package:utils’:
findMatches
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Attaching package: ‘IRanges’
The following object is masked from ‘package:clusterProfiler’:
slice
The following object is masked from ‘package:lubridate’:
%within%
The following objects are masked from ‘package:dplyr’:
collapse, desc, slice
The following object is masked from ‘package:purrr’:
reduce
Attaching package: ‘AnnotationDbi’
The following object is masked from ‘package:clusterProfiler’:
select
The following object is masked from ‘package:dplyr’:
select
library(RColorBrewer)
Use the ALPACA algorithm to infer the network for each COAD subtype. You can use the run_alpaca.r script to run ALPACA between two networks and you have three outputs: the membership table, the alpaca object and the list of top genes.
Now we will read the membership table for the CMS2 and CMS4 subtypes.
outputDir <- "../results/alpaca-batch-coad-subtype-connnectivity-20250317/"
We read the membership table for the CMS2 and CMS4 subtypes.
# Read back the membership table
membership_cms2_cms4_tibble <- read_csv("../data/processed/alpaca-batch-coad-subtype-20240510/membership_csm2_csm4.csv")
Rows: 20646 Columns: 3── Column specification ───────────────────────────────────────────────
Delimiter: ","
chr (1): node
dbl (2): module, modularity
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
membership_cms2_cms4_tibble
And now we want to check the number of nodes in each module.
count_total <- membership_cms2_cms4_tibble %>%
count(module, name = "node_count")
# Count nodes ending in _A (tf)
count_A <- membership_cms2_cms4_tibble %>%
filter(grepl("_A$", node)) %>%
count(module, name = "count_tf")
# Count nodes ending in _B (genes)
count_B <- membership_cms2_cms4_tibble %>%
filter(grepl("_B$", node)) %>%
count(module, name = "count_gene")
# Join the two counts
count_A_B <- full_join(count_A, count_B, by = "module")
count_A_B <- full_join(count_A_B, count_total, by = "module")
# SAve the table
write_csv(count_A_B, file = paste(outputDir, 'cms2_cms4_nodes_per_cluster.csv', sep = ""))
count_A_B
NA
We also plot the Number of nodes in each module
# draw histogram of the percentage of nodes in each module
count_A_B %>%
dplyr::select(module, count_tf, count_gene) %>%
gather(key = "type", value = "count", -module) %>%
ggplot(aes(x = module, y = count, fill = type)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Module", y = "Number of nodes", fill = "Type")+
# change colors to Set2
scale_fill_brewer(palette = "Set1") +
theme(legend.position = "top")
ggsave(file = paste(outputDir, 'cms2_cms4_nodes_per_cluster.pdf', sep = ""), width = 6, height = 4)
Check the percentage of nodes in each module. This shows how big
compared to the total number of nodes in the network are the nodes in
each module.
We also normalize for the number of tf and genes, such that TFs and
genes are comparable.
# do the same but with percentage respective to the total number of A nodes and B nodes
# fill NA with 0
count_A_B[is.na(count_A_B)] <- 0
count_A_B$percentage_tf = count_A_B$count_tf / sum(count_A_B$count_tf) * 100
count_A_B$percentage_gene = count_A_B$count_gene / sum(count_A_B$count_gene) * 100
count_A_B %>%
dplyr::select(module, percentage_tf, percentage_gene) %>%
gather(key = "type", value = "percentage", -module) %>%
ggplot(aes(x = factor(module), y = percentage, fill = type)) +
# show all xlabel
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 0, hjust = 1)) +
labs(x = "Module", y = "Number of nodes", fill = "Type")+
# change colors to Set2
scale_fill_brewer(palette = "Set1") +
theme(legend.position = "top")
ggsave(file = paste(outputDir, 'cms2_cms4_percentage_nodes_per_cluster.pdf', sep = ""), width = 6, height = 4)
We also reformat the membership table to have the node name and the type in separate columns.
# Split node column in the node_name and the type split by "_"
membership_cms2_cms4_tibble <- membership_cms2_cms4_tibble %>%
separate(node, into = c("node_name", "type"), sep = "_", remove = FALSE)
membership_cms2_cms4_tibble
library(dplyr)
library(stringr)
df_grouped_string <- membership_cms2_cms4_tibble %>%
filter(grepl("_B$", node)) %>%
mutate(node = str_remove(node, "_B")) %>%
group_by(module) %>%
summarise(nodes = list(node))
tf_grouped_string <- membership_cms2_cms4_tibble %>%
filter(grepl("_A$", node)) %>%
mutate(node = str_remove(node, "_A")) %>%
group_by(module) %>%
summarise(nodes = list(node))
Here are the functions to run GO/KEGG/REACTOME analysis on the genes in each module
# Function to perform enrichment analysis per module
perform_enrichment <- function(genes, module) {
# Convert ENSEMBL to ENTREZ
genes_entrez <- bitr(genes, fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
# If no valid ENTREZ IDs are found, return NULL
if (nrow(genes_entrez) == 0) return(NULL)
# Perform GO enrichment analysis
ego <- enrichGO(
gene = genes_entrez$ENTREZID,
OrgDb = org.Hs.eg.db,
keyType = "ENTREZID",
ont = "BP", # Biological Process
pAdjustMethod = "BH",
pvalueCutoff = 1,
qvalueCutoff = 1
)
# Add module information and return results as a dataframe
if (!is.null(ego)) {
ego_df <- as.data.frame(ego)
ego_df$module <- module # Add module ID
return(ego_df)
} else {
print('No valid ENTREZ IDs found for module')
return(NULL)
}
}
# Function to perform enrichment analysis per module
perform_kegg_enrichment <- function(genes, module) {
# Convert ENSEMBL to ENTREZ
genes_entrez <- bitr(genes, fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
# If no valid ENTREZ IDs are found, return NULL
if (nrow(genes_entrez) == 0) return(NULL)
# Perform GO enrichment analysis
ego <- enrichKEGG(
gene = genes_entrez$ENTREZID,
organism = 'hsa',
pAdjustMethod = "BH",
pvalueCutoff = 1,
qvalueCutoff = 1
)
# Add module information and return results as a dataframe
if (!is.null(ego)) {
ego_df <- as.data.frame(ego)
ego_df$module <- module # Add module ID
return(c(ego_df))
} else {
print('No valid ENTREZ IDs found for module')
return(NULL)
}
}
library(ReactomePA)
ReactomePA v1.46.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use ReactomePA in published research, please cite:
Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 2016, 12(2):477-479
# Function to perform enrichment analysis per module
perform_reactome_enrichment <- function(genes, module) {
# Convert ENSEMBL to ENTREZ
genes_entrez <- bitr(genes, fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
# If no valid ENTREZ IDs are found, return NULL
if (nrow(genes_entrez) == 0) return(NULL)
ego <- enrichPathway(gene=genes_entrez$ENTREZID,
pAdjustMethod = "BH",
pvalueCutoff = 1,
qvalueCutoff = 1
, readable=TRUE)
# Add module information and return results as a dataframe
if (!is.null(ego)) {
ego_df <- as.data.frame(ego)
ego_df$module <- module # Add module ID
return(ego_df)
} else {
print('No valid ENTREZ IDs found for module')
return(NULL)
}
}
# Iterated over the list of genes in each module and perform pathway analysis and concatenate the results table
# Iterate over each module and perform enrichment analysis
all_results <- map_dfr(df_grouped_string$module, function(module) {
genes <- unlist(df_grouped_string$nodes[df_grouped_string$module == module])
if (length(genes)>10){
print(paste('module',module))
perform_enrichment(genes, module)
} else {
return(NULL)
}
})
[1] "module 1"
'select()' returned 1:many mapping between keys and columns
Warning: 17.97% of input gene IDs are fail to map...
[1] "module 2"
'select()' returned 1:many mapping between keys and columns
Warning: 11.46% of input gene IDs are fail to map...
[1] "module 3"
'select()' returned 1:many mapping between keys and columns
Warning: 9.57% of input gene IDs are fail to map...
[1] "module 4"
'select()' returned 1:many mapping between keys and columns
Warning: 14.98% of input gene IDs are fail to map...
[1] "module 5"
'select()' returned 1:many mapping between keys and columns
Warning: 9.61% of input gene IDs are fail to map...
[1] "module 6"
'select()' returned 1:many mapping between keys and columns
Warning: 14.55% of input gene IDs are fail to map...
[1] "module 7"
'select()' returned 1:many mapping between keys and columns
Warning: 15.01% of input gene IDs are fail to map...
[1] "module 8"
'select()' returned 1:many mapping between keys and columns
Warning: 28.94% of input gene IDs are fail to map...
[1] "module 9"
'select()' returned 1:many mapping between keys and columns
Warning: 13.97% of input gene IDs are fail to map...
[1] "module 10"
'select()' returned 1:1 mapping between keys and columns
Warning: 27.63% of input gene IDs are fail to map...
[1] "module 11"
'select()' returned 1:many mapping between keys and columns
Warning: 22.12% of input gene IDs are fail to map...
[1] "module 12"
'select()' returned 1:many mapping between keys and columns
Warning: 16.8% of input gene IDs are fail to map...
[1] "module 13"
'select()' returned 1:many mapping between keys and columns
Warning: 16.98% of input gene IDs are fail to map...
[1] "module 14"
'select()' returned 1:1 mapping between keys and columns
Warning: 42.11% of input gene IDs are fail to map...
# Print combined results
print(all_results)
all_results %>% write_tsv(file = paste(outputDir, 'cms2_cms4_pathway_go.tsv', sep = ""))
all_result_significant <- all_results %>%
filter(p.adjust < 0.05) %>%
arrange(p.adjust)
print(all_result_significant)
library(ggplot2)
library(dplyr)
# Select the top 10 pathways per module
top_results <- all_result_significant %>%
group_by(module) %>%
slice_min(pvalue, n = 10) %>% # Get the top 10 pathways per module
ungroup()
# Convert module to factor for better visualization
top_results$module <- as.factor(top_results$module)
options(repr.plot.width = 14, repr.plot.height = 6) # Wider plot
# Create grouped bar plot
ggplot(top_results, aes(x = reorder(Description, -pvalue), y = -log10(pvalue), fill = module)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip for readability
facet_wrap(~ module, scales = "free_y", ncol = 3) + # Separate plots per module
labs(
title = "Top 10 Enriched Pathways per Module",
x = "Pathway",
y = "-log10(p-value)",
fill = "Module"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 11),
axis.title = element_text(size = 11),
legend.position = "right"
) #+
#scale_fill_brewer(palette = "Set3") # Use categorical colors
# Save the plot as a wide image
ggsave(file = paste(outputDir, 'top10_gos_per_module.pdf', sep = ""), width = 18, height = 18)
# Iterated over the list of genes in each module and perform pathway analysis and concatenate the results table
# Iterate over each module and perform enrichment analysis
reactome_results <- map_dfr(df_grouped_string$module, function(module) {
genes <- unlist(df_grouped_string$nodes[df_grouped_string$module == module])
if (length(genes)>10){
print(paste('module',module))
perform_reactome_enrichment(genes, module)
} else {
return(NULL)
}
})
[1] "module 1"
'select()' returned 1:many mapping between keys and columns
Warning: 17.97% of input gene IDs are fail to map...
[1] "module 2"
'select()' returned 1:many mapping between keys and columns
Warning: 11.46% of input gene IDs are fail to map...
[1] "module 3"
'select()' returned 1:many mapping between keys and columns
Warning: 9.57% of input gene IDs are fail to map...
[1] "module 4"
'select()' returned 1:many mapping between keys and columns
Warning: 14.98% of input gene IDs are fail to map...
[1] "module 5"
'select()' returned 1:many mapping between keys and columns
Warning: 9.61% of input gene IDs are fail to map...
[1] "module 6"
'select()' returned 1:many mapping between keys and columns
Warning: 14.55% of input gene IDs are fail to map...
[1] "module 7"
'select()' returned 1:many mapping between keys and columns
Warning: 15.01% of input gene IDs are fail to map...
[1] "module 8"
'select()' returned 1:many mapping between keys and columns
Warning: 28.94% of input gene IDs are fail to map...
[1] "module 9"
'select()' returned 1:many mapping between keys and columns
Warning: 13.97% of input gene IDs are fail to map...
[1] "module 10"
'select()' returned 1:1 mapping between keys and columns
Warning: 27.63% of input gene IDs are fail to map...
[1] "module 11"
'select()' returned 1:many mapping between keys and columns
Warning: 22.12% of input gene IDs are fail to map...
[1] "module 12"
'select()' returned 1:many mapping between keys and columns
Warning: 16.8% of input gene IDs are fail to map...
[1] "module 13"
'select()' returned 1:many mapping between keys and columns
Warning: 16.98% of input gene IDs are fail to map...
[1] "module 14"
'select()' returned 1:1 mapping between keys and columns
Warning: 42.11% of input gene IDs are fail to map...
# Print combined results
print(reactome_results)
reactome_results %>% write_tsv(file = paste(outputDir, 'cms2_cms4_pathway_reactome.tsv', sep = ""))
all_reactome_result_significant <- reactome_results %>%
filter(p.adjust < 0.2) %>%
arrange(p.adjust)
print(all_reactome_result_significant)
library(ggplot2)
library(dplyr)
# Select the top 10 pathways per module
top_results <- all_reactome_result_significant %>%
group_by(module) %>%
slice_min(pvalue, n = 15) %>% # Get the top 10 pathways per module
ungroup()
# Convert module to factor for better visualization
top_results$module <- as.factor(top_results$module)
options(repr.plot.width = 14, repr.plot.height = 6) # Wider plot
# Create grouped bar plot
ggplot(top_results, aes(x = reorder(Description, -pvalue), y = -log10(pvalue), fill = module)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip for readability
facet_wrap(~ module, scales = "free_y", ncol = 3) + # Separate plots per module
labs(
title = "Top 10 Enriched Pathways per Module",
x = "Pathway",
y = "-log10(p-value)",
fill = "Module"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 11),
axis.title = element_text(size =11),
legend.position = "right"
) #+
#scale_fill_brewer(palette = "Set3") # Use categorical colors
# Save the plot as a wide image
ggsave(file = paste(outputDir, 'top10_reactomes_per_module.pdf', sep = ""), width = 30, height = 15, dpi = 300)
# Iterate over each module and perform enrichment analysis
all_kegg_results <- map_dfr(df_grouped_string$module, function(module) {
genes <- unlist(df_grouped_string$nodes[df_grouped_string$module == module])
if (length(genes)>10){
print(paste('module',module))
perform_kegg_enrichment(genes, module)
} else {
return(NULL)
}
})
[1] "module 1"
'select()' returned 1:many mapping between keys and columns
Warning: 17.97% of input gene IDs are fail to map...Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
[1] "module 2"
'select()' returned 1:many mapping between keys and columns
Warning: 11.46% of input gene IDs are fail to map...
[1] "module 3"
'select()' returned 1:many mapping between keys and columns
Warning: 9.57% of input gene IDs are fail to map...
[1] "module 4"
'select()' returned 1:many mapping between keys and columns
Warning: 14.98% of input gene IDs are fail to map...
[1] "module 5"
'select()' returned 1:many mapping between keys and columns
Warning: 9.61% of input gene IDs are fail to map...
[1] "module 6"
'select()' returned 1:many mapping between keys and columns
Warning: 14.55% of input gene IDs are fail to map...
[1] "module 7"
'select()' returned 1:many mapping between keys and columns
Warning: 15.01% of input gene IDs are fail to map...
[1] "module 8"
'select()' returned 1:many mapping between keys and columns
Warning: 28.94% of input gene IDs are fail to map...
[1] "module 9"
'select()' returned 1:many mapping between keys and columns
Warning: 13.97% of input gene IDs are fail to map...
[1] "module 10"
'select()' returned 1:1 mapping between keys and columns
Warning: 27.63% of input gene IDs are fail to map...
[1] "module 11"
'select()' returned 1:many mapping between keys and columns
Warning: 22.12% of input gene IDs are fail to map...
[1] "module 12"
'select()' returned 1:many mapping between keys and columns
Warning: 16.8% of input gene IDs are fail to map...
[1] "module 13"
'select()' returned 1:many mapping between keys and columns
Warning: 16.98% of input gene IDs are fail to map...
[1] "module 14"
'select()' returned 1:1 mapping between keys and columns
Warning: 42.11% of input gene IDs are fail to map...
# Print combined results
print(all_kegg_results)
all_kegg_results[0] %>% write_tsv(file = paste(outputDir, 'cms2_cms4_pathway_kegg.tsv', sep = ""))
all_kegg_result_significant <- all_kegg_results %>%
filter(p.adjust < 0.2) %>%
arrange(p.adjust)
print(all_kegg_result_significant)
library(ggplot2)
library(dplyr)
# Select the top 10 pathways per module
top_results <- all_kegg_result_significant %>%
group_by(module) %>%
slice_min(pvalue, n = 15) %>% # Get the top 10 pathways per module
ungroup()
# Convert module to factor for better visualization
top_results$module <- as.factor(top_results$module)
options(repr.plot.width = 14, repr.plot.height = 6) # Wider plot
# Create grouped bar plot
ggplot(top_results, aes(x = reorder(Description, -pvalue), y = -log10(pvalue), fill = module)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip for readability
facet_wrap(~ module, scales = "free_y", ncol = 5) + # Separate plots per module
labs(
title = "Top 10 Enriched Pathways per Module",
x = "Pathway",
y = "-log10(p-value)",
fill = "Module"
) +
theme_minimal() +
theme(
axis.text.y = element_text(size = 11),
axis.title = element_text(size = 11),
legend.position = "right"
) #+
#scale_fill_brewer(palette = "Set3") # Use categorical colors
# Save the plot as a wide image
ggsave(file = paste(outputDir, 'top10_keggs_per_module.pdf', sep = ""), width = 25, height = 10, dpi = 300)
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(pheatmap)
kegg = all_kegg_results
# Drop rows where subcategory is NA
kegg <- kegg %>%
filter(!is.na(subcategory))
# Filter significant terms (p.adjust < 0.05) and count
ktemp <- kegg %>%
filter(p.adjust < 0.05) %>%
count(subcategory, module)
library(readr)
library(dplyr)
library(tidyr)
# Pivot wider
ktemp_wide <- ktemp %>%
pivot_wider(names_from = module, values_from = n, values_fill = 0)
# Set rownames and remove 'subcategory' column
ktemp_mat <- ktemp_wide %>%
column_to_rownames('subcategory')
library(viridis)
Loading required package: viridisLite
# Important: Make sure all columns are numeric (they are currently character!)
ktemp_mat <- ktemp_mat %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
# --- Masking zeros: make them NA ---
ktemp_mat_masked <- ktemp_mat
ktemp_mat_masked[ktemp_mat_masked == 0] <- NA
# --- Sort rows based on increasing order of first module (column "1") ---
# (if you want to sort by a specific module index)
module1 <- "1" # choose the module you want to sort by
if (module1 %in% colnames(ktemp_mat_masked)) {
ktemp_mat_masked <- ktemp_mat_masked[order(ktemp_mat_masked[, module1], decreasing = TRUE, na.last = TRUE), ]
}
# --- Plot heatmap ---
pheatmap(
ktemp_mat_masked,
color = viridis(100), # White color for NA (masked)
cluster_rows = FALSE,
cluster_cols = FALSE,
angle_col = 45, # Slight tilt to x labels
border_color = "grey90",
fontsize_row = 9,
fontsize_col = 10,
legend_breaks = c(1, 3, 5, 7, 10),
main = "Significant KEGG Terms",
cellwidth = 20, # Force cells to be square
cellheight = 20 # Force cells to be square
)
# --- Pivot longer for ggplot ---
ktemp_long <- ktemp %>%
mutate(module = as.factor(module)) # Make module categorical
# --- Sort subcategories based on NAME alphabetically ---
sorting_order <- ktemp_long %>%
distinct(subcategory) %>% # get unique subcategories
arrange(subcategory) %>% # sort alphabetically
pull(subcategory)
# Then use it to reorder
ktemp_long <- ktemp_long %>%
mutate(subcategory = factor(subcategory, levels = sorting_order))
ktemp_long <- ktemp_long %>%
mutate(subcategory = factor(subcategory, levels = sorting_order))
# Create heatmap
p <- ggplot(ktemp_long, aes(x = module, y = subcategory, fill = n)) +
geom_tile(color = "white", linewidth = 0.5) +
scale_fill_viridis(option = "D", na.value = "white", name = "Significant terms") +
theme_minimal(base_size = 12) +
theme(
axis.text.x = element_text(angle = 0, vjust = 0.5, hjust=0.5),
axis.text.y = element_text(hjust = 1),
panel.grid = element_blank(),
axis.title = element_blank(),
aspect.ratio = 2, # <-- force squares
plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 8) # smaller axis text for fitting
) +
labs(title = "Significant KEGG Terms by Module")
# Show the plot
print(p)
# Save it, set exact square dimensions (e.g., 7x7 inches)
ggsave(
filename = "kegg_heatmap_square.pdf",
plot = p,
width = 7, # Adjusted to 7x7 for better aspect ratio
height = 7, # Ensuring it's square
units = "in",
dpi = 300
)